Death and Lightness: 
Using a Demographic Model to Find Support Verbs 
Abstract 

Some verbs have a particular kind of binary 
ambiguity: they can carry their normal, 
full meaning, or they can be merely acting 
as a prop for the nominal object. It has 
been suggested that there is a detectable 
pattern in the relationship between a verb 
acting as a prop (a SUPPORT verb) and 
the noun it supports. 

The task this paper undertakes is to de- 
velop a model which identifies the support 
verb for a particular noun, and by exten- 
sion, when nouns are enumerated, a model 
which disambiguates a verb with respect to 
its support status. The paper sets up a ba- 
sic model as a standard for comparison; it 
then proposes a more complex model, and 
gives some results to support the model's 
validity, comparing it with other similar 
approaches. 

1 Introduction 

It is well-known that some verbs have a binary ambi- 
guity: consider the sentences Kim took a photograph 
of Dale and Kim took a painting of Dale. The for- 
mer can be a paraphrase of Kim photographed Dale 
while the latter has no such paraphrase. This is 
because, in one reading of the first example, the lex- 
eme take is only acting as a prop or support for a 
content-bearing noun, a capacity first noted by Jes- 
persen (1942); while in the second example the verb 
has only its full meaning of gain possession. It has 
been noted that many support verbs (SVs), like take 
and make — as in make a distinction (equivalent to 
distinguish) — are quite productive, being able to act 
as support verbs for a number of different nouns; 
and investigations have suggested that there may be 
a pattern in the relationship between SV and noun 
(e.g. Makkai, 1977; Wierzbicka, 1982). Discovering 
the correspondence between SV and noun can help 
disambiguate the verb, deciding whether it is acting 
as an SV or not; that is the aim of this paper. 



The concept of ambiguity here is similar to that 
of word-level sense ambiguity (see, for example, 
Yarowsky, 1992), rather than to higher-level ambiguity 
such as that of garden-path sentences (Gibson, 
1995). Because an SV by itself does not represent 
a separate concept — the whole construction take a 
walk represents a single concept of walking — while a 
full verb can, removing this type of ambiguity from 
input is particularly important for determining map- 
pings in areas where input text is translated into an- 
other form, such as to text in another language, in 
machine translation (Danlos and Samvelian, 1992), 
or to a meaning representation (Meteer, 1991); it is 
also useful for dealing with multiword constructions 
in language, like idioms (Abeille, 1988; Storrer and 
Schwall, 1993). 

More generally, identifying a verb as an SV indi- 
cates its lack of propositional content, and so can 
contribute to more accurate readability measures, 
such as lexical density (described in Halliday, 1985). 
Knowing whether a word lacks content is similarly 
important in the area of information retrieval, in 
explicitly constructing stoplists (Salton, 1988), on 
the assumption that content-free words should be 
deleted from the search space of key terms. These 
lists can be made more comprehensive by recog- 
nising the similar lack of content in SVs and the 
closed-class words, which are traditionally consid- 
ered to comprise the set of content-free words (Hall- 
iday, 1985). Another area of potential use is in style 
checking, where it is generally recommended that 
SVs be removed for reasons of clarity (Kane, 1983); 
for example, [1a] becomes [1b]. 

1a. It is important for teachers to have a knowledge 
of their students. 

1b. It is important for teachers to know their students. 

Characterisation and identification techniques for 
SVs have used both purely semantic methods and 
more syntactic, surface-based ones. An example 
of the former is given in Wierzbicka's 1982 paper, 
where a set of semantic rules is proposed to determine 
the SV that corresponds to a particular 
noun, concentrating on explanations of phenomena 
like why one can have a drink but not *have an eat. 
Her analysis of these phenomena leads to rules like: 

2. The support verb is have if the nominalisa- 
tion represents an action aiming at a percep- 
tion which could cause one to know something 
and which would not cause one to feel bad if it 
didn't. 

The surface-based approaches aim to overcome the 
laborious nature of determining such semantic rules 
by assuming that the syntactic structure reflects 
enough of the semantics to make a surface statistical 
analysis possible. Fontenelle (1993) proposed 
a surface-based approach, which uses the work 
of Smadja (1991) on collocation relations in text. 
His method, however, requires multi-lingual ma- 
chine readable dictionaries, which may not be read- 
ily available, and the prior division of words into sets 
according to Mel'cuk's Meaning-Text Theory (cf. 
Mel'cuk and Zholkovsky, 1988; Steele, 1990). 

A more recent example of the surface-based ap- 
proach is the statistical technique proposed by 
Grefenstette and Teufel (1995). A statistical anal- 
ysis sounds intuitively plausible, given that it has 
been suggested (Halliday, 1985) that there is a re- 
lationship between frequency of a word, a surface 
phenomenon, and its content-freeness. However, 
Grefenstette and Teufel's (1995) statistical tech- 
nique only uses frequency with respect to a par- 
ticular noun (which I call LOCAL information), 
rather than any more general notion of frequency. 
So, for example, in identifying an SV for demand, 
Grefenstette and Teufel look only at the frequency 
of co-occurrence of various verbs with demand. As 
a result, in their system meet is chosen as the corre- 
sponding SV as it is the most frequently co-occurring 
verb; make, the actual SV, is ranked lower. How- 
ever, knowing that make is a generally productive 
SV would lead it to be a more obvious candidate, 
despite its lower ranking. A probabilistic argument 
to this effect is given in section 3; the key approximation 
on which it relies, the use of data with respect to 
all other SV constructions (GLOBAL information), 
is drawn from a model in demographic statistics, and 
is outlined first in section 2. 

2 Mortality 

This section looks at a standardisation model 
used in mortality which is a useful one for 
SVs: it provides a way of combining in- 
formation about a subpopulation — the target 
population — with a larger population which pro- 
vides more information — the standard popula- 
tion. An overview of the theory is given below; 
more detail can be found in standard demographic 



texts such as Pollard et al (1981). The method de- 
scribed in this paper combines local and global infor- 
mation about SVs in a similar way; the correspon- 
dence will be discussed in more detail in Section 3. 



2.1 Standard populations 

The most easily obtainable mortality rate, the crude 
death rate (CDR), is calculated by dividing the to- 
tal number of deaths for a population by the total 
size of the population. However, this does not accu- 
rately reflect the mortality experienced by the pop- 
ulation: Pollard et al (1981) discuss the situation of 
Maori and non-Maori populations of New Zealand in 
1966, where the Maori population had a lower CDR 
despite having higher mortality rates for every age 
group. 

The explanation for this discrepancy comes from the 
different profiles of each population: the age cate- 
gories which experience the lowest rates have higher 
population sizes, weighting the overall population 
rate so that it also is lower. So the Maori popula- 
tion has a much higher number of young people, who 
have lower death rates; this produces a lower overall 
rate, as the CDR effectively weights the measure- 
ment by the distribution of the Maori population. 
Using a common (or standard) population is one way 
of removing this bias. 
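
The effect described above can be reproduced with a small Python sketch. The figures below are hypothetical, chosen only to illustrate the weighting effect; they are not the New Zealand data discussed by Pollard et al. Population A has a higher death rate than population B in every age bracket, yet its crude death rate is lower because most of its members fall in the low-mortality bracket.

```python
def crude_death_rate(age_groups):
    """Total deaths divided by total population.

    age_groups: list of (population size, age-specific death rate) pairs.
    """
    deaths = sum(pop * rate for pop, rate in age_groups)
    total = sum(pop for pop, _ in age_groups)
    return deaths / total


# Hypothetical (population, death rate) pairs for young, middle and old brackets.
population_a = [(8000, 0.002), (1500, 0.010), (500, 0.060)]   # young-heavy profile
population_b = [(3000, 0.001), (4000, 0.008), (3000, 0.050)]  # older profile

cdr_a = crude_death_rate(population_a)
cdr_b = crude_death_rate(population_b)

# A is worse in every bracket, but its CDR is still the lower of the two.
a_worse_everywhere = all(ra > rb for (_, ra), (_, rb) in zip(population_a, population_b))
print(a_worse_everywhere, cdr_a < cdr_b)
```
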



2.2 Indirect Standardised Death Rate 
(ISDR) 

One standard demographic technique for producing 
a figure comparable between populations is to apply 
age-specific mortality rates from the standard popu- 
lation to the corresponding age brackets of the target 
population, giving the number of deaths that would 
be expected in the target population if the levels of 
mortality in the standard population were being ex- 
perienced. These expected deaths are summed, and 
used in the calculation of the standardised mortal- 
ity ratio, which is equal to actual deaths for the tar- 
get population divided by expected deaths; it repre- 
sents the degree above or below expectation to which 
deaths actually occurred. This ISDR is often the 
preferred measure of standardisation when the tar- 
get population is too small to accurately calculate 
age-specific mortality rates, using as it does those of 
the standard population in their place. 
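
The calculation can be sketched in Python as follows. All figures and names here are hypothetical and serve only as an illustration. The sketch also assumes the usual demographic definition in which the ISDR is the standardised mortality ratio multiplied by the crude death rate of the standard population.

```python
def expected_deaths(target_pop_by_age, standard_rates_by_age):
    """Deaths expected in the target population under standard-population rates."""
    return sum(target_pop_by_age[age] * standard_rates_by_age[age]
               for age in target_pop_by_age)


def standardised_mortality_ratio(actual_deaths, target_pop_by_age, standard_rates_by_age):
    """Actual deaths divided by expected deaths."""
    return actual_deaths / expected_deaths(target_pop_by_age, standard_rates_by_age)


# Hypothetical target population sizes and standard age-specific rates.
target_pop = {"0-14": 4000, "15-44": 3000, "45+": 1000}
standard_rates = {"0-14": 0.001, "15-44": 0.002, "45+": 0.020}
actual_deaths = 45      # hypothetical observed deaths in the target population
standard_cdr = 0.008    # hypothetical crude death rate of the standard population

smr = standardised_mortality_ratio(actual_deaths, target_pop, standard_rates)
isdr = smr * standard_cdr  # assumption: ISDR = SMR x standard-population CDR
print(smr, isdr)
```

Note that only the total number of deaths is needed for the target population; its own age-specific rates never appear, which is what makes the method usable for small populations.
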

There is no one definitive standard population for 
two given target populations. One frequently chosen 
standard population is the union of the two target 
populations: for example, when comparing Maori 
mortality with non-Maori mortality in New Zealand, 
the total New Zealand population was used as stan- 
dard. 



3 A Probabilistic Model 

This section describes a probabilistic model for the 
prediction of support verbs, along with the approxi- 
mations and assumptions being made; these are then 
justified by recourse to the demographic model de- 
scribed in section 2. 

The most likely support verb for a given nominal- 
isation is defined as that verb which has the high- 
est probability of being an SV for that nominalisa- 
tion; taking the point estimate of this probability, 
the most likely support verb for a nominalisation is 
that verb which has the highest frequency of occur- 
rence as an SV with the nominalisation. That is: 



$$SV(j) = \arg\max_{i \in V} f_{ij} \qquad (1)$$

where 

• SV(j) = most likely SV for nominalisation j 

• fij = frequency with which verb i appears to be 
supporting nominalisation j 

This quantity fij is, of course, unknown, as there are 
no corpora tagged for verb lightness — that tagging 
is the purpose of the identification method proposed 
in this paper and others. Now, fij can be rewritten 
as 



$$f_{ij} = m_{ij} \, p_{ij} \qquad (2)$$

where 

• mij = number of occurrences of verb i 
governing¹ nominalisation j 

• pij = Pr (verb i is acting as an SV | nominali- 
sation j) 

3.1 Basic model 

Again, pij is unknown and cannot be estimated directly. 
One approach is to make the admittedly inaccurate 
assumption that pij is equal to 1 for all i 
and j. That is, the verb chosen to be the SV is simply 
the one which most frequently governs the nominalisation 
in the chosen training corpus. Then 

$$f_{ij} = m_{ij} \qquad (3)$$

which gives 

$$SV'(j) = \arg\max_{i \in V} m_{ij} \qquad (4)$$

1 I use 'govern' in the sense that if X is a complement 
of Y, then Y governs X; this is in line with Mel'cuk 
(1988). 



Grefenstette and Teufel (1995) effectively use this 
assumption, with the additional modification of restricting 
the count mij by only counting those occurrences 
where the SV construction has similar 
characteristics to the equivalent full verb. So, for 
example, the preposition qualifying the noun in the 
SV construction is generally the same as the preposition 
attached to the full verb (make a decision to ..., 
decide to ...); they use this type of information, 
when collecting data, to give a more accurate mij. 

In this paper I will only be looking at the gain that 
can be made from attempting to estimate pij, so I 
will be using this definition of SV'(j) as the main 
basis for comparison. I will, however, also compare 
the results against the model of Grefenstette and 
Teufel (1995), to compare the degree of improvement 
expected of each over the basic model. 
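
As a minimal sketch of the basic model in equation (4), the following Python fragment picks the most frequent governing verb for a nominalisation. The counts are hypothetical and the function name `basic_sv` is introduced here for illustration only.

```python
from collections import Counter


def basic_sv(local_counts):
    """Return the verb i with the highest count m_ij for one nominalisation j."""
    return max(local_counts, key=local_counts.get)


# Hypothetical counts of verbs governing the nominalisation "demand".
demand_counts = Counter({"meet": 12, "make": 9, "create": 5})

print(basic_sv(demand_counts))
```

With counts of this shape the basic model prefers meet over make, which is the kind of error that purely local information produces.
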

3.2 Global Information Model 

Now, an approximation for pij suggested by the demographic 
model above is to use the unconditional 
probability over all nominalisations; call this pi. So: 

$$f_{ij} = m_{ij} \, p_i \qquad (5)$$

where 



$$p_i = \Pr(\text{verb } i \text{ is acting as an SV}) \qquad (6)$$

The case for using such a significant approximation 
here is the same as the case for using it in the de- 
mographic model described in section 2; it is the ap- 
proximation around which the model is built. The 
use of unconditional probabilities as an approxima- 
tion to the conditional ones parallels, in the demo- 
graphic model, the use of the standard or global pop- 
ulation rates when calculating statistics on the sub- 
or target population. The correspondence between 
elements of the two models is given in Table 1. 

The reason behind using the global rates in both 
cases is similar — the local probabilities cannot ac- 
curately be estimated. A difference, however, can 
be noted. In the demographic model, the local and 
global probabilities of dying will both usually follow 
typical mortality curves: high rates at birth, declin- 
ing until late teens, an 'accident hump' of higher 
mortality, another decline, and then increasing with 
middle and old age. But, in the language model, the 
conditional and unconditional probabilities are less 
innately similar. The conditional probabilities will 
generally be more dichotomous: for a given nominalisation, 
a verb will either (virtually) always or 
(virtually) never be an SV. The global, unconditional 
probabilities will, on the other hand, be a 
more mixed distribution: for example, make may 
have an unconditional probability of being an SV of 
0.3, have a probability of 0.23, and so on. 

Mortality: description            | Mortality: instance | Lightness: description                                           | Lightness: instance
----------------------------------|---------------------|------------------------------------------------------------------|-------------------------------------------------------
target population                 | Maori population    | local information (data for given nominalisation, such as make)  | all verbs governing given nominalisation, such as make
standard population               | NZ population       | global information (data for all nominalisations)                | all verbs governing all nominalisations
age category                      | ages 15-25          | verb                                                             | instances of make as a governing verb
target population mortality rate  |                     | conditional probability pij                                      |
standard population mortality rate|                     | unconditional probability pi                                     |

Table 1: Correspondence between mortality and lightness models 

It should also be noted, however, that the mortality 
curves of the demographic model can actually be 
very dissimilar also — the mortality rate distribu- 
tion for the target population may lack an accident 
hump, or the probability of dying may approach 1 
much faster and earlier than in the global mortality 
distribution, producing an effect similar to that in 
the language situation described. The approxima- 
tion technique is fairly robust, to allow for this, and 
is not greatly affected by the choice of the global 
population or rates (see Pollard et al., 1981: 72). In any 
case, it is more accurate than the assumption of pij 
equalling 1: the pi are a ranking of the likelihood of a 
verb being an SV when no context is known, mean- 
ing that more likely candidates can be identified, 
whereas the basic model gives no such indication. 

Estimating these unconditional probabilities relies 
on an assumption that support verbs are productive 
to some extent. This appears to hold true for at least 
the major support verbs: for example, make acts as an SV 
for attempt, criticism, decision, error, judgment, and 
many other nominalisations. A corollary of the assumption 
is that non-support verbs will not exhibit the same 
generality across nominalisations; this seems to be 
borne out by inspection of the global information 
described in section 4. 

Then, given this assumption, the unconditional 
probabilities can be estimated by, for the purposes of 
this estimation only, treating all occurrences of verbs 
governing nominalisations in the corpus as acting as 
support verbs, and aggregating these to give the un- 
conditional probability. That is: 



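$$p'_i = \frac{\sum_{k} m_{ik}}{\sum_{h \in V} \sum_{k} m_{hk}} \qquad (7)$$

where k ranges over nominalisations and h over verbs. 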

This can be thought of as producing a global order- 
ing of verbs, in which the rankings approximate the 
likelihood of acting as an SV because of the productivity 
of support verbs, which leads to pi and 
p'i correlating reasonably well. Productive support 
verbs will tend to govern a range of different nominalisations 
and be ranked high in this ordering; their 
higher values of p'i correspond to their higher likelihood 
of being support verbs as measured by pi. A 
less productive SV, such as bear, will still have a low 
probability estimate p'i. This approximation is not 
expected to give accurate estimates for pi; it is only 
important that it correlate with pi, as the process of 
choosing the most likely SV only requires that the 
ranking of verbs in order of the probability of be- 
ing an SV be accurate. Again, an inspection of the 
global information described in section 4 seems to 
bear this out. 

Given these approximations, the most likely SV un- 
der this model is given by: 

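$$\begin{aligned}
SV''(j) &= \arg\max_{i \in V} \, m_{ij} \, p'_i \\
        &= \arg\max_{i \in V} \, m_{ij} \, \frac{\sum_{k} m_{ik}}{\sum_{h \in V} \sum_{k} m_{hk}} \\
        &= \arg\max_{i \in V} \, m_{ij} \sum_{k} m_{ik} \qquad (8)
\end{aligned}$$

where k ranges over nominalisations and h over verbs. The denominator is the same for every verb i, so it does not affect the choice of maximum and can be dropped. 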
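
To illustrate equation (8), the following Python sketch weights each local count by the verb's total count over all nominalisations. The counts and the names `global_totals` and `global_sv` are hypothetical, introduced only for this illustration.

```python
from collections import defaultdict


def global_totals(m):
    """Sum each verb's counts over all nominalisations.

    m: {nominalisation: {verb: count of the verb governing that nominalisation}}
    """
    totals = defaultdict(int)
    for verb_counts in m.values():
        for verb, count in verb_counts.items():
            totals[verb] += count
    return dict(totals)


def global_sv(m, nominalisation):
    """Return the verb maximising local count times global total."""
    totals = global_totals(m)
    local = m[nominalisation]
    return max(local, key=lambda verb: local[verb] * totals[verb])


# Hypothetical co-occurrence counts, not taken from the experiments in this paper.
m = {
    "demand": {"meet": 12, "make": 9, "create": 5},
    "decision": {"make": 30, "reach": 6},
    "attempt": {"make": 25, "include": 2},
    "resemblance": {"bear": 15, "have": 3},
    "effect": {"have": 40, "produce": 8},
}

print(global_sv(m, "demand"), global_sv(m, "resemblance"))
```

In this toy data make overtakes meet for demand because of its high global total, while bear still wins for resemblance because its local count is large enough to offset its low global total.
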

4 Experimental work 

4.1 Deriving local and global information 

To gather both local and global information, the 
1992 version of Grolier's encyclopedia of approxi- 
mately 8 million words was used, tagged by the part- 
of-speech tagger developed by Brill (1993). A heuris- 
tic for producing the local information about a tar- 



get population involved searching the corpus for the 
nominalisation, determining the verb for which the 
nominalisation was the direct object, and tallying 
the relative frequency of these verbs. 
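
One possible form of such a heuristic is sketched below in Python. It is an assumption for illustration, not the procedure actually used: it takes a list of (word, tag) pairs with Penn Treebank style tags, scans left from each occurrence of the nominalisation past determiners, adjectives, possessives and numbers, and counts the word found if it is tagged as a verb. The `lemma` parameter is a hypothetical hook for a lemmatiser; by default it only lowercases.

```python
from collections import Counter

# Tags that may stand between a verb and its direct object noun (an assumption).
SKIPPABLE = {"DT", "JJ", "PRP$", "CD"}


def governing_verbs(tagged, noun, lemma=str.lower):
    """Tally the verbs found immediately governing `noun` in a tagged token list."""
    counts = Counter()
    for idx, (word, tag) in enumerate(tagged):
        if word.lower() != noun or not tag.startswith("NN"):
            continue
        k = idx - 1
        # Skip over pre-modifiers of the noun.
        while k >= 0 and tagged[k][1] in SKIPPABLE:
            k -= 1
        # Count the word only if it is tagged as some form of verb.
        if k >= 0 and tagged[k][1].startswith("VB"):
            counts[lemma(tagged[k][0])] += 1
    return counts


# Hand-tagged versions of two short sentences (tags are illustrative).
tagged = [
    ("He", "PRP"), ("made", "VBD"), ("his", "PRP$"), ("formal", "JJ"),
    ("proposal", "NN"), ("to", "TO"), ("the", "DT"), ("full", "JJ"),
    ("committee", "NN"), (".", "."),
    ("He", "PRP"), ("put", "VBD"), ("the", "DT"), ("proposal", "NN"),
    ("in", "IN"), ("the", "DT"), ("drawer", "NN"), (".", "."),
]

print(governing_verbs(tagged, "proposal"))
```
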

Grefenstette and Teufel (1995) note that a confound- 
ing factor in the local information, when picking out 
nominalisations and their governing verbs, is that 
the nominal may have become concretised. Gen- 
erally, nominalisations represent an abstract con- 
cept, being essentially events represented in noun 
form; but it is possible for the nominal to represent 
a physical embodiment of that concept. For exam- 
ple: 

3a. ABSTRACT: He made his formal proposal to the 
full committee. 

3b. CONCRETISED: He put the proposal in the 
drawer. 

The abstract and concretised versions will tend to 
have different governing verbs. However, if the as- 
sumption about productivity in Section 3.2 is true, 
and the global information is a good approximation 
to the innate lightness of a verb, the correct SV will 
be favoured over those associated with the concre- 
tised forms. 

4.2 Generating nominalisations for global 
information 

To construct the global information, data for all 
nominalisations is needed. A large list of nominalisa- 
tions was derived in a partially automated manner 
from Longman's Dictionary of Contemporary En- 
glish (LDOCE) using both built-in information and 
a heuristic: since a nominal is an event represented 
in noun form, the procedure used here for deriving a 
list of them involved looking for nouns with associ- 
ated STEM VERBS; e.g., decide is the stem verb of de- 
cision. Some verbs have this information encoded in 
their entries: for example, adjust lists adjustment as 
its nominalisation; there were 257 verbs in this cate- 
gory. For others, an automatic orthographic heuris- 
tic that matched nouns with verbs produced a set of 
candidates, which was manually filtered to produce 
1414 more deverbal nominalisations. 
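
The exact orthographic heuristic is not specified here; the Python sketch below shows one hypothetical variant, assuming a simple suffix-stripping match. It strips a common nominalising suffix from each noun and proposes any verb that begins with the remaining stem. The suffix list and the minimum stem length are assumptions, and the output would still need manual filtering.

```python
# Hypothetical list of nominalising suffixes; not taken from the paper.
SUFFIXES = ("ation", "ition", "sion", "tion", "ment", "ance", "ence", "al")


def candidate_pairs(nouns, verbs, min_stem=4):
    """Propose (noun, verb) pairs where the verb starts with the noun's stem."""
    pairs = set()
    for noun in nouns:
        for suffix in SUFFIXES:
            stem = noun[: -len(suffix)]
            # Require a reasonably long stem to limit spurious matches.
            if noun.endswith(suffix) and len(stem) >= min_stem:
                pairs.update((noun, verb) for verb in verbs if verb.startswith(stem))
    return sorted(pairs)


nouns = ["decision", "adjustment", "proposal", "nation"]
verbs = ["decide", "adjust", "propose", "name"]

print(candidate_pairs(nouns, verbs))
```
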

A set of support verb constructions and their 
constituent nominalisations was drawn from a 
range of sources (see Table 2 and bibliography for 
references) and used as the test set for the experiment.² 
The list of nominals did not cover some of 
the nominals from the test set, so the local information 
was generated from the training corpus for each of 
2 These sources have assumed that the propositional 
meanings of the SV construction and the full verb are 
equivalent. This may be disputed in a number of cases, 
but for the purposes of this paper, the equivalence of the 
two meanings will be taken as indicated by the relevant 
source. 



the missing test set nominals and aggregated into 
the global information. 

4.3 The test set and results 

A system to identify support verbs for nominalisa- 
tions, based on the global information model, was 
implemented by tabulating the lemmatised forms of 
all the verbs for which these nominals were the di- 
rect object. Candidate support verbs were ranked in 
order of values of SV''(j), and the maximum of these 
values chosen. 

The test set and results are summarised in Table 2; 
the table contains: 

• the source text; 

• the corresponding verb, which the source can 
be rewritten as; 

• the reference from which the source text was 
taken; 

• the system's first choice candidate for support 
verb C1 for the source text's constituent nominalisation 
(i.e. the verb category with the highest 
expected number of light verbs, SV''(j)); 

• the system's second choice C2; and 

• the ratio of the expected number of light verbs 
for the first and second choices. 
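
The last three items can be computed as in the following Python sketch. It assumes each candidate is scored by its local count multiplied by its global total, as in the global information model; the counts and the function name `top_two` are hypothetical.

```python
def top_two(local, totals):
    """Return (C1, C2, ratio of their scores) for one nominalisation.

    local:  {verb: count of the verb governing this nominalisation}
    totals: {verb: count of the verb governing any nominalisation}
    C2 and the ratio are None when there is only one candidate,
    and all three are None when there are no candidates.
    """
    ranked = sorted(local, key=lambda verb: local[verb] * totals[verb], reverse=True)
    if not ranked:
        return None, None, None
    c1 = ranked[0]
    if len(ranked) == 1:
        return c1, None, None
    c2 = ranked[1]
    ratio = (local[c1] * totals[c1]) / (local[c2] * totals[c2])
    return c1, c2, ratio


# Hypothetical counts for a single nominalisation.
local = {"meet": 12, "make": 9, "create": 5}
totals = {"meet": 12, "make": 64, "create": 5}

print(top_two(local, totals))
```

A ratio close to 1 indicates that the second choice is nearly as strong a candidate as the first.
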

A second system was implemented based on the ba- 
sic model of section 3.1; for comparison, Table 3 
gives the results of this system. 

4.4 Discussion 

4.4.1 Analysis of results of the global 
information model 

Of the 18 examples, 13 choices of support verb match 
the corresponding one from the source text. Of the 
five cases where the chosen SV did not match the 
source verb, one was actually valid: harm had cause 
as the proposed alternative. This is an equally plau- 
sible support verb, and in any case, do was the sec- 
ond choice by only a small margin. This is true 
for a number of cases: where there is an alterna- 
tive support verb to the one used in the source text, 
the second alternative represents another plausible 
choice, and the frequency ratio margin is small (for 
example, for change and resemblance) . 

In three cases lack of data is a problem, resulting in 
the three N/A values in Table 2: there are no occurrences 
of snooze or shove as direct objects of verbs 
in Grolier's, most probably because they belong to 
a more informal register than that used in encyclo- 
pedias. Similarly, have a drink is an informal phrase 
that would not normally be found in an encyclope- 
dia, as evidenced by there being only one occurrence 
of a governing verb for drink. 

Source Text             | Verb        | C1     | C2      | Ratio | Reference
------------------------|-------------|--------|---------|-------|---------------------
make an attempt         | attempt     | make   | include | 9.36  | Dras, Dale (1995)
make a change           | change      | make   | produce | 1.85  | Dras, Dale (1995)
make a concession       | concede     | make   | include | 11.47 | Dras, Dale (1995)
make a demand           | demand      | make   | create  | 1.03  | Gref., Teufel (1995)
make a distinction      | distinguish | make   | have    | 3.04  | Meteer (1991)
have a drink (of)       | drink       | become | N/A     | N/A   | Wierzbicka (1982)
have an effect (on)     | affect      | have   | produce | 3.04  | Dras, Dale (1995)
have a feeling          | feel        | have   | produce | 3.27  | Harris (1957)
make a gift (of)        | give        | have   | include | 9.89  | Harris (1957)
do harm (to)            | harm        | cause  | do      | 1.26  | Huddleston (1968)
make a judgment         | judge       | make   | have    | 2.43  | Dras, Dale (1995)
have a knowledge (of)   | know        | have   | use     | 12.36 | Kane (1983)
make progress           | progress    | make   | allow   | 64.33 | Harris (1957)
make a proposal         | propose     | make   | include | 1.10  | Gref., Teufel (1995)
bear a resemblance (to) | resemble    | bear   | have    | 2.64  | Huddleston (1968)
give a shove (to)       | shove       | N/A    | N/A     | N/A   | Harris (1957)
have a snooze           | snooze      | N/A    | N/A     | N/A   | Harris (1957)
make use (of)           | use         | make   | have    | 6.55  | Dras, Dale (1995)

Table 2: Support verb candidates chosen by the system 



Source Text             | Verb        | SV'(j)
------------------------|-------------|--------
make an attempt         | attempt     | make
make a change           | change      | undergo
make a concession       | concede     | make
make a demand           | demand      | meet
make a distinction      | distinguish | make
have a drink (of)       | drink       | become
have an effect (on)     | affect      | have
have a feeling          | feel        | express
make a gift (of)        | give        | have
do harm (to)            | harm        | cause
make a judgment         | judge       | make
have a knowledge (of)   | know        | have
make progress           | progress    | make
make a proposal         | propose     | reject
bear a resemblance (to) | resemble    | bear
give a shove (to)       | shove       | N/A
have a snooze           | snooze      | N/A
make use (of)           | use         | make

Table 3: Support verb candidates chosen under the basic model 



The system's worst performance was with the nomi- 
nalisation gift. This appears to have occurred as gift 
is frequently concretised, as in She has a great gift 
which has astounded her teachers or This deal in- 
cludes a free gift! However, only one of the 18 cases 
appears to be affected in this way. 

4.4.2 Comparison 

So, allowing alternative SVs, and disregarding the 
cases where the genres of the test data differed from 
the genre of the training (encyclopedia) data, the 
success rate is 14 of 15 (93%), using a 66Mb corpus. 
By comparison, Grefenstette and Teufel (1995) 
achieve plausible SVs for 7 of 10 cases (70%), using a 
134Mb corpus; and using the basic model described 
in section 3.1 achieves plausible support verbs for 10 of 
15 cases (67%). The higher results achieved by the method 
proposed in this paper are statistically significant at 
the 10% and 5% levels respectively. Stronger results 
may be obtained given more test data — it is diffi- 
cult to do better than an improvement of 5% signif- 
icance with only 15 cases. Developing a larger set 
will require further work, as there is often disagree- 
ment about the validity of equivalence between SV 
constructions and full verbs. The data do suggest, 
however, that this is worthwhile, particularly when 
it is noted that the higher success rate was achieved 
with a smaller corpus. 

In general, the method seems to cover well both pro- 
ductive and idiomatic SV constructions. For exam- 
ple, have is productive in light verb constructions, 
and the high global frequency will give a relatively 
high expected lightness rate; however, it does not 
eliminate the possibility of a low-frequency verb (like 
bear) being a support verb in cases where the SV 
construction is strongly idiomatic (as in bear a re- 
semblance), where the low frequency in the standard 
population is counterbalanced by a high frequency 
in the target population. 

5 Conclusion 

In the process of calculating expected SVs, a number 
of significant assumptions were made: 

• that concretised nominals would not have a sig- 
nificant impact compared with the effect of the 
standardisation; 

• that in initially constructing the global infor- 
mation, all verbs can be taken as light; and 

• that the productivity of SVs allows the con- 
struction of a reasonable standard population. 

Notwithstanding these considerations, the experi- 
mental results demonstrate that approximating the 
conditional probabilities, by the unconditional prob- 
abilities derived from a large number of nominalisa- 
tions, provides accurate choices for support verbs for 



individual nominalisations in the test set. The accu- 
racy of the method appears to be superior to existing 
statistical methods which use only local information; 
and the method also involves only minimal develop- 
ment effort, unlike existing semantic methods. 

It is apparent that what constitutes a valid light verb 
construction depends on the genre and register of 
the text. Given that the test set was taken from 
a wide range of sources, more accurate results for 
this test set could no doubt be obtained by using a 
corpus that was more representative of general En- 
glish. Also, more accurate results might be gained 
after further iterations of this process: once the most 
likely support verb in a given local information is de- 
termined, the global information can be regenerated 
using these, rather than the assumption of universal 
lightness of verbs. Further work will look at imple- 
menting this iterative process, developing a larger 
set of test data for evaluation purposes, and extend- 
ing this method to other light constructions — light 
verbs with adjectival complements, and light nouns 
with post-modifiers. 

Also, in order to successfully carry out a process of 
disambiguation on a random text, the coverage of 
nominalisations and SVs needs to be greater; a key 
aspect of future work is expanding the set of data to 
achieve this better coverage. 